A brief description of the Model of the World Economy
implemented at the Institute for Economic Analysis is 
presented, together with our experience in converting the 
software to vector code. 

For each time period, the model is reduced to a linear 
system of over 2000 variables. The matrix of coefficients 
has a bordered block diagonal structure, and we show how some
of the matrix operations can be carried out on all diagonal 
blocks at once. 

We present some other details of the algorithms and 
report running times. 



1. Description of the Model


The first input-output model of the world economy was 
originally developed for the United Nations by Leontief, Carter 
and Petri [1977] as a tool for evaluating alternative long-term 
economic policies. The most recent version that has been 
implemented spans the period 1970-2030 in 10-year intervals. 

The model is dynamic in the sense that the solution for each 
10-year period requires information obtained from the solution 
for the previous period. In this paper we focus on the solution 
of a single time period. 

In the current version of the model, the world is divided 
into 16 regions (r=16) and for each of the regions the detailed 
economic activities are described by a set of linear algebraic 
equations of the form 

$$A_i y_i + S_i w = 0 \qquad (i = 1, \ldots, r). \tag{1}$$

The components of the vectors $y_i$ correspond to levels of
domestic production, imports, and exports of goods and services,
and so on, for each region, and $w$ is the vector of
total world exports. In addition there are global constraints
described by the equation 

$$\sum_{i=1}^{r} G_i y_i = 0, \tag{2}$$

which imposes consistency among regional trade relations.

A more detailed description of the model can be found in 
Leontief, Carter and Petri [1977], Duchin and Szyld [1979], and 
Szyld [1981].
All the matrices involved are very sparse. For example,
$A_i$ could be 200 x 250 with 2500 nonzeros,
$S_i$ could be 200 x 50 with 50 nonzeros,
$G_i$ could be 50 x 250 with 100 nonzeros.

Each matrix $A_i$ has more columns than rows and therefore some
components of $y_i$ have to be prescribed.

If $x_i$ are the vectors of unknown components of $y_i$, and $M_i$
and $E_i$ are the corresponding submatrices of $A_i$ and $G_i$, the whole
model for a single time period can be regarded as a linear
system of equations, referred to below as (3), in over 3000
variables with a nonsymmetric bordered block diagonal matrix of
coefficients: the blocks $M_i$ lie on the diagonal, the blocks $S_i$
and $E_i$ form the right and bottom borders, and all other blocks
are zero.
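
A sketch of this bordered block diagonal system, under the definitions above, may help fix the notation. Suppose each $y_i$ is split into its unknown part $x_i$ and its prescribed part $p_i$, and let $N_i$ and $F_i$ be the columns of $A_i$ and $G_i$ that multiply $p_i$. The names $p_i$, $N_i$, $F_i$ and $g$ are introduced here for illustration only and are assumptions, not the source's notation. Equations (1) and (2) then read $M_i x_i + S_i w = b_i$ with $b_i = -N_i p_i$, and $\sum_i E_i x_i = g$ with $g = -\sum_i F_i p_i$, which collected over all regions would take the shape

$$
\begin{pmatrix}
M_1 &     &        &     & S_1 \\
    & M_2 &        &     & S_2 \\
    &     & \ddots &     & \vdots \\
    &     &        & M_r & S_r \\
E_1 & E_2 & \cdots & E_r &
\end{pmatrix}
\begin{pmatrix} x_1 \\ x_2 \\ \vdots \\ x_r \\ w \end{pmatrix}
=
\begin{pmatrix} b_1 \\ b_2 \\ \vdots \\ b_r \\ g \end{pmatrix}.
$$

The bottom-right block is blank because equation (2) does not involve $w$.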

When the model was first implemented, the program for
the solution of (3) inverted the matrices $M_i$ and stored the
inverses. The approximate computer time to perform this task
was 4 hours on a PDP-11. The (dense) inverses were saved for
subsequent runs, during which they were updated depending on
which components of $y_i$ were prescribed and on changes in the
matrices $A_i$. Each of these subsequent runs required 110 seconds
on an IBM 370 for each time period.

The set of prescribed components of $y_i$ and the matrices $A_i$
are used to determine a scenario, i.e., a set of economic
assumptions. Studies carried out with the World Model compare
results of different scenarios, i.e., the implications of the
different assumptions. The consequences of the introduction 
of new technologies, different development strategies, or 
shifts in trade patterns are among the numerous scenarios that 
can be analyzed. Thus, the World Model is a flexible tool to 
analyze alternative policies. Several large scale empirical 
studies have been carried out with this model. The most recent 
ones are reported in Leontief and Duchin [1983], Leontief and 
Sohn [1982], Leontief, Koo, Nasar and Sohn [1983] and Leontief,
Mariscal and Sohn [1982]. 

To make this tool much more flexible we needed to greatly 
reduce the computational resources required to run a scenario. 

A first step in that direction was the application of sparse 
matrix techniques for the solution of (3). In the present 
implementation the matrices $A_i$ are stored using a sparse
scheme, i.e., only the nonzero elements are stored, together
with some integer arrays indicating their locations. A single
array of approximate length 3200 contains all vectors $x_i$, $i = 1, \ldots, r$.
Other such arrays contain the vectors $b_i$, the nonzero values
of the matrices $S_i$ and $G_i$, or other data objects. Similarly,
objects like the nonzeros of the matrices $M_i$ appear in single
arrays of length close to 5000.
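
The storage layout described above can be sketched as follows. This is a minimal illustration in Python, assuming a coordinate-style scheme (values plus row and column index arrays) and a pointer array marking where each region starts in the long array. The source does not say which sparse layout is used, and the helper names `to_sparse`, `sparse_matvec` and `pack` are hypothetical.

```python
# Hypothetical illustration: one coordinate-style sparse matrix per region,
# and one long array holding the per-region vectors back to back.

def to_sparse(dense):
    """Return (values, rows, cols) for the nonzeros of a dense matrix."""
    values, rows, cols = [], [], []
    for r, row in enumerate(dense):
        for c, a in enumerate(row):
            if a != 0.0:
                values.append(a)
                rows.append(r)
                cols.append(c)
    return values, rows, cols

def sparse_matvec(values, rows, cols, x, nrows):
    """Multiply a coordinate-format matrix by a dense vector x."""
    y = [0.0] * nrows
    for a, r, c in zip(values, rows, cols):
        y[r] += a * x[c]
    return y

def pack(vectors):
    """Concatenate per-region vectors; pointer[i] is the start of region i."""
    packed, pointer = [], [0]
    for v in vectors:
        packed.extend(v)
        pointer.append(len(packed))
    return packed, pointer

# Two tiny "regions" stand in for the 16 regions of the model.
A1 = [[2.0, 0.0, 0.0],
      [0.0, 0.0, 3.0]]
vals, rows, cols = to_sparse(A1)
x_all, ptr = pack([[1.0, 2.0, 3.0], [4.0, 5.0]])
x1 = x_all[ptr[0]:ptr[1]]                 # slice out the first region
y1 = sparse_matvec(vals, rows, cols, x1, nrows=2)
```

Keeping all regions in one array is what later allows a single long vector operation to act on every region at once.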

2. Method of Solution

The algorithmic details of the solution of (3) are given in 
Duchin and Szyld [1979], Szyld [1981], and Furlong and Szyld 
[1982]. Here we enumerate the operations for the solution 
of (3) very schematically. 
loop 1.  For i = 1, ..., r
         1.1.  Read $A_i$ and the prescribed elements of $y_i$
         1.2.  Produce $M_i$, $E_i$ and $b_i$
         1.3.  Obtain factorization of $M_i$

loop 2.  For i = 1, ..., r
         2.1.  Prepare different right hand sides with
               columns of $S_i$
         2.2.  Solve systems with matrix $M_i$

loop 3.  Obtain $w$

loop 4.  For i = 1, ..., r
         4.1.  Compute $b_i - S_i w$
         4.2.  Solve $M_i x_i = b_i - S_i w$
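
The role of loop 3 can be made explicit by block elimination. Write the global constraints restricted to the unknowns as $\sum_i E_i x_i = g$, where the symbol $g$ for that right hand side is an assumption made here, since the source does not name it. Substituting $x_i = M_i^{-1}(b_i - S_i w)$ from step 4.2 gives a small system for $w$ alone:

$$
\Bigl( \sum_{i=1}^{r} E_i M_i^{-1} S_i \Bigr)\, w \;=\; \sum_{i=1}^{r} E_i M_i^{-1} b_i \;-\; g .
$$

Under this reading, each column of $M_i^{-1} S_i$ is one of the solves of step 2.2, and the order of the system for $w$ equals the length of $w$.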



The factorization of the matrices $M_i$ (in step 1.3) and the
solution of several linear systems with them (in steps 2.2 and
4.2) are performed with routines from the MA28 set developed
by Duff [1977].
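
The four loops can be tried out on a toy problem. The sketch below is a dense Python stand-in, not the authors' code: MA28 is replaced by a small Gaussian elimination purely for illustration, and the function names as well as the right hand side `g` of the global constraints are assumptions introduced here.

```python
def solve(M, R):
    """Solve M X = R (R is n x k, one column per right hand side) by
    Gaussian elimination with partial pivoting."""
    n, k = len(M), len(R[0])
    a = [list(M[i]) + list(R[i]) for i in range(n)]      # augmented matrix
    for p in range(n):
        piv = max(range(p, n), key=lambda i: abs(a[i][p]))
        a[p], a[piv] = a[piv], a[p]
        for i in range(p + 1, n):
            f = a[i][p] / a[p][p]
            for j in range(p, n + k):
                a[i][j] -= f * a[p][j]
    X = [[0.0] * k for _ in range(n)]
    for c in range(k):                                   # back substitution
        for i in reversed(range(n)):
            s = a[i][n + c] - sum(a[i][j] * X[j][c] for j in range(i + 1, n))
            X[i][c] = s / a[i][i]
    return X

def matmul(A, B):
    return [[sum(A[i][t] * B[t][j] for t in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def solve_bordered(M, S, E, b, g):
    """M[i], S[i], E[i] are the blocks of region i; b[i] and g are lists."""
    r, m = len(M), len(g)
    # Loops 1 and 2: solve M_i [Z_i | z_i] = [S_i | b_i] for each region.
    Z, z = [], []
    for i in range(r):
        rhs = [S[i][row] + [b[i][row]] for row in range(len(b[i]))]
        sol = solve(M[i], rhs)
        Z.append([row[:m] for row in sol])
        z.append([[row[m]] for row in sol])
    # Loop 3: (sum_i E_i Z_i) w = sum_i E_i z_i - g.
    C = [[0.0] * m for _ in range(m)]
    d = [[-g[p]] for p in range(m)]
    for i in range(r):
        EZ, Ez = matmul(E[i], Z[i]), matmul(E[i], z[i])
        for p in range(m):
            d[p][0] += Ez[p][0]
            for q in range(m):
                C[p][q] += EZ[p][q]
    w = [row[0] for row in solve(C, d)]
    # Loop 4: form b_i - S_i w (step 4.1) and solve with M_i (step 4.2).
    x = []
    for i in range(r):
        rhs = [[b[i][row] - sum(S[i][row][q] * w[q] for q in range(m))]
               for row in range(len(b[i]))]
        x.append([row[0] for row in solve(M[i], rhs)])
    return x, w

# Two regions with two unknowns each and a single world variable.
M = [[[2.0, 0.0], [0.0, 1.0]], [[1.0, 0.0], [0.0, 4.0]]]
S = [[[1.0], [0.0]], [[0.0], [2.0]]]
E = [[[1.0, 1.0]], [[1.0, 0.0]]]
b = [[3.0, 1.0], [2.0, 6.0]]
g = [4.0]
x, w = solve_bordered(M, S, E, b, g)
```

The data were chosen so that the exact solution is $w = 1$, $x_1 = (1, 1)$ and $x_2 = (2, 1)$. A production code would factor each $M_i$ once and reuse the factors in loops 2 and 4 instead of eliminating twice as this sketch does.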

We report the running times for a single time period with 
this method of solution without any vector code in Table 1. 


Table 1.

System/compiler options                      CPU sec.

IBM 370/168                                  [illegible]
IBM 3033                                     ~20
Cyber 205, no options                        11.46
Cyber 205, vectorization by the compiler     9.04
Architectural features combined with the sparse matrix
techniques resulted in running times three to ten times faster
than the 110 seconds that subsequent runs required after
computation of the inverses in the first implementation of the
World Model. The goal is now to obtain vector code for the
Cyber 205 that will further reduce the overall running time. 

3. Code Vectorization

The redesign of the World Model software for its efficient 
use on the Cyber 205 was conceived in three phases: 

I. Elementary operations over all regions 

II. The MA28 package inner loops 

III. New concepts for MA28 

Phase I consists essentially of the vectorization of all 
operations except those associated with the factoring of the 
matrices $M_i$ and the solution of the corresponding linear systems.
Those operations correspond to steps 1.2, 2.1, and 4.1. Each 
of these steps has a different structure but they all are 
loops operating on vectors of length about 200, inside another 
loop of length 16. The basic idea was to split the outer loop 
and perform simultaneously the operations on all vectors of the 
different regions, i.e., on vectors of length of about 3200. 
Cyber 205 FORTRAN commands such as scatter, gather and bit 
operations were used throughout. 

We illustrate the vectorization of step 4.1. The length
of $w$ is about 50. $S_i$ is a rectangular matrix of about 200 rows,
with only one nonzero entry per column. It is stored as a
vector with an accompanying integer array indicating in which
row each nonzero entry lies. The following FORTRAN statements
are part of the sequential code for step 4.1.

C     Loop over regions, then over the nonzeros of S for each region.
      DO 100 II=1,NREG
        IBEG=(II-1)*NTRADE
        IBEGB=IPNTB(II)-1
        DO 50 I=1,NTRADE
C         Position in B of the row holding the I-th nonzero of S.
          INDEX=KTRDBG(IBEG+I)+IBEGB
          B(INDEX)=B(INDEX)-EXPSH(I+IBEG)*W(I)
   50   CONTINUE
  100 CONTINUE

The running time for these loops was 1008 µsec. Different
vectorization options were analyzed. One of them consisted of
scattering the vectors that contain the nonzero values of $S_i$
and $w$ to vectors of length of about 3200 and then performing
the triad operation. This required 9514 clock cycles, or about
190 µsec. The version adopted performs the multiplication of
the vectors containing the nonzeros of $S_i$ and $w$ first, a
vector operation of length about 800, scatters that vector and
performs the final subtraction in 7250 clock cycles or 145 µsec,
a gain of a factor of 7 over the sequential code.
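
The adopted reorganization can be mimicked in a few lines of Python. This is only a sketch of the data flow: the array names follow the FORTRAN listing but with 0-based indexing, the function names are hypothetical, and the plain scatter assumes that no two nonzeros of $S_i$ share a row, which the source does not state explicitly.

```python
def step41_sequential(b, expsh, w, ktrdbg, ipntb, nreg, ntrade):
    """0-based transcription of the doubly nested sequential loop."""
    b = list(b)
    for ii in range(nreg):
        ibeg = ii * ntrade
        ibegb = ipntb[ii]                  # start of region ii inside b
        for i in range(ntrade):
            index = ktrdbg[ibeg + i] + ibegb
            b[index] -= expsh[ibeg + i] * w[i]
    return b

def step41_vector_style(b, expsh, w, ktrdbg, ipntb, nreg, ntrade):
    """Same result organised as three long operations over all regions."""
    n = nreg * ntrade
    # 1. One multiply of length nreg*ntrade (w is reused for every region).
    prod = [expsh[k] * w[k % ntrade] for k in range(n)]
    # 2. Scatter the products into a vector as long as b.
    scattered = [0.0] * len(b)
    for k in range(n):
        scattered[ktrdbg[k] + ipntb[k // ntrade]] = prod[k]
    # 3. One subtraction over the full length of b.
    return [bi - si for bi, si in zip(b, scattered)]

# Two regions of three rows each, two nonzeros of S per region.
b = [10.0, 20.0, 30.0, 40.0, 50.0, 60.0]
ipntb = [0, 3]
ktrdbg = [0, 2, 1, 2]
expsh = [1.0, 2.0, 3.0, 4.0]
w = [5.0, 6.0]
seq = step41_sequential(b, expsh, w, ktrdbg, ipntb, 2, 2)
vec = step41_vector_style(b, expsh, w, ktrdbg, ipntb, 2, 2)
```

Both functions return the same vector. As a check on the quoted timings, 7250 clock cycles in 145 µsec corresponds to 20 nsec per cycle, and at that rate 9514 cycles is about 190 µsec.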

Similar gains have been achieved in the other portions of 
the code vectorized in phase I. Unfortunately only a small 
portion of the total running time of the World Model is spent 
in the code vectorized in phase I. Thus the overall gain was 
relatively small. 

About 30% of the total running time of the World Model is 
spent on routines of the MA28 package in which the matrices $M_i$
are factored (step 1.3), and solutions with many right hand 
sides computed (steps 2.2 and 4.2). At the present time we 
have completed only part of phase II, the vectorization of 
some of the inner loops in the MA28 set. 
Due to the startup time in any vector operation, it is
common practice to look into the length of the vectors involved
in the operation to decide if the vectorization is really
worthwhile. In codes for sparse matrices, the vector length for an
operation is usually the number of nonzero elements in a particular
row or column, and thus varies within the code. The technique
used in this case is to test whether the vector length is above
a particular value and to branch the processing of that particular
row or column to vector or sequential code accordingly. The running
time of the code incorporating these features is 7.33 CPU seconds,
cf. Table 1.
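
The length test can be sketched as a simple dispatch. The Python below only illustrates the control flow: the cut-off value `VECTOR_THRESHOLD` and all function names are hypothetical, since the source gives neither, and Python lists have no vector startup cost, so nothing is gained in speed here.

```python
VECTOR_THRESHOLD = 8   # hypothetical cut-off; the source gives no value

def update_scalar(y, alpha, vals, idx):
    """Element-by-element update y[idx[k]] += alpha * vals[k]."""
    for k in range(len(vals)):
        y[idx[k]] += alpha * vals[k]

def update_gather_scatter(y, alpha, vals, idx):
    """Same update arranged as gather, one long multiply-add, scatter.
    Assumes the indices in idx are distinct, as in a single sparse row."""
    gathered = [y[j] for j in idx]                               # gather
    updated = [g + alpha * v for g, v in zip(gathered, vals)]    # vector op
    for j, u in zip(idx, updated):                               # scatter
        y[j] = u

def sparse_row_update(y, alpha, vals, idx, threshold=VECTOR_THRESHOLD):
    """Branch on the number of nonzeros; return the name of the path taken."""
    if len(vals) >= threshold:
        update_gather_scatter(y, alpha, vals, idx)
        return "vector"
    update_scalar(y, alpha, vals, idx)
    return "scalar"

y_short = [0.0] * 10
path_short = sparse_row_update(y_short, 2.0, [1.0, 2.0], [3, 7])

y_long = [1.0] * 10
path_long = sparse_row_update(y_long, 0.5, [1.0] * 8, list(range(8)))
```

On a vector machine the threshold would be tuned so that the startup cost of the gather, arithmetic and scatter instructions is recovered only for rows or columns that are long enough.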

Phase III, not yet implemented, consists of reconceptualizing 
the MA28 set. We will investigate the possibility of solving 
several right hand sides simultaneously, as well as other features 
like special treatment of right hand sides with few nonzero 
elements.